We present LiDARGen, a novel, effective, and controllable generative model that produces realistic LiDAR point cloud sensory readings. Our method leverages a powerful score-matching energy-based model and formulates the point cloud generation process as a stochastic denoising process in the equirectangular view. This model enables us to sample diverse and high-quality point cloud samples with guaranteed physical feasibility and controllability. We validate the effectiveness of our method on the challenging KITTI-360 and nuScenes datasets. Quantitative and qualitative results show that our approach produces more realistic samples than other generative models. Furthermore, LiDARGen can sample point clouds conditioned on inputs without retraining, and we demonstrate that the proposed generative model can be used directly to densify LiDAR point clouds. Our code is available at: https://www.zyrianov.org/lidargen/
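To make the sampling procedure concrete, below is a minimal sketch of annealed Langevin dynamics, the standard sampler for score-matching energy-based models of this kind. The score network `score_fn`, the noise schedule, and the toy Gaussian score are illustrative assumptions, not LiDARGen's actual implementation:

```python
import numpy as np

def annealed_langevin_sample(score_fn, shape, sigmas, steps_per_level=10,
                             eps=2e-5, rng=None):
    """Annealed Langevin dynamics over an equirectangular range image.

    score_fn(x, sigma): estimate of the score (gradient of log-density) --
    a hypothetical stand-in for the trained network.
    sigmas: decreasing noise levels, e.g. np.geomspace(1.0, 0.01, 10).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.normal(size=shape)                   # start from pure noise
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # per-level step size
        for _ in range(steps_per_level):
            z = rng.normal(size=shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x  # range image; unproject to 3D points afterwards

# Toy usage with the exact score of a Gaussian smoothed at noise level sigma.
toy_score = lambda x, sigma: -x / (1.0 + sigma ** 2)
sample = annealed_langevin_sample(toy_score, (64, 1024), np.geomspace(1.0, 0.01, 10))
print(sample.shape)  # (64, 1024)
```

Conditional tasks such as densification can be grafted onto this loop by re-imposing the observed range pixels after each update step, which is one common conditioning trick; the paper's exact conditioning mechanism may differ.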
It is well known that building a reliable and robust framework that connects multi-agent deep reinforcement learning algorithms with practical multi-robot applications is difficult. To fill this gap, we propose and build an open-source framework for multi-robot systems called MultiRoboLearn. The framework provides a unified setup for simulation and real-world deployment: it aims to offer standard, easy-to-use simulated scenarios that can also be easily deployed to real-world multi-robot environments. In addition, the framework provides researchers with a benchmark system for comparing the performance of different reinforcement learning algorithms. We demonstrate the generality, scalability, and capability of the framework using different types of multi-agent deep reinforcement learning algorithms in both discrete and continuous action spaces.
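The abstract does not expose the framework's API, but the kind of unified simulation/real-world interface it describes typically looks like a gym-style multi-agent environment. The sketch below is entirely hypothetical (none of these names come from MultiRoboLearn); it only illustrates how a single control loop can drive either a simulator or real robots:

```python
from typing import Dict
import numpy as np

class MultiRobotEnv:
    """Hypothetical unified interface: the same loop drives a simulator
    backend or a real-robot backend."""
    def __init__(self, agent_ids, backend="sim"):
        self.agent_ids, self.backend = list(agent_ids), backend

    def reset(self) -> Dict[str, np.ndarray]:
        return {a: np.zeros(4) for a in self.agent_ids}   # per-agent observation

    def step(self, actions: Dict[str, np.ndarray]):
        obs = {a: np.random.randn(4) for a in self.agent_ids}
        rewards = {a: 0.0 for a in self.agent_ids}
        dones = {a: False for a in self.agent_ids}
        return obs, rewards, dones, {}

env = MultiRobotEnv(["robot_0", "robot_1"])
obs = env.reset()
for _ in range(100):
    # Continuous 2D actions; a discrete-action algorithm would emit integers instead.
    actions = {a: np.random.uniform(-1, 1, size=2) for a in env.agent_ids}
    obs, rewards, dones, info = env.step(actions)
```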
There has been extensive research on solving differential equations with physics-informed neural networks. Although this approach has proven advantageous in many cases, a major criticism is its lack of analytical error bounds, which makes it less credible than traditional counterparts such as finite difference methods. This paper shows that explicit error bounds can be mathematically derived for physics-informed neural networks trained on a class of linear systems of differential equations. More importantly, evaluating such error bounds only requires evaluating the infinity norm of the differential equation residual over the domain of interest. Our work establishes a link between the network residual, which is known and serves as the loss function, and the absolute error of the solution, which is generally unknown. Our approach is semi-phenomenological and independent of knowledge of the actual solution or of the network's complexity or architecture. We empirically verify the error evaluation algorithm on linear ODEs and systems of linear ODEs with manufactured solutions, and demonstrate that the actual errors strictly lie within our derived bounds.
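Concretely, for a linear system $u'(t) = A\,u(t)$ the only quantity the bound consumes is the sup-norm of the residual over the domain. A minimal sketch follows, assuming `u_net` and `du_net` are the trained network solution and its derivative (hypothetical names; a real PINN would obtain the derivative by automatic differentiation):

```python
import numpy as np

def residual_inf_norm(u_net, du_net, A, t0, t1, n=10_000):
    """Dense-grid estimate of ||r||_inf for u'(t) = A u(t) on [t0, t1].

    u_net(t)  -> (n, d) network solution at times t
    du_net(t) -> (n, d) its time derivative
    """
    t = np.linspace(t0, t1, n)
    r = du_net(t) - u_net(t) @ A.T   # ODE residual at each collocation point
    return np.abs(r).max()

# Sanity check with the exact solution of u' = -u: the residual vanishes.
A = np.array([[-1.0]])
u_exact  = lambda t: np.exp(-t)[:, None]
du_exact = lambda t: -np.exp(-t)[:, None]
print(residual_inf_norm(u_exact, du_exact, A, 0.0, 1.0))  # ~0
```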
Recurrent neural networks (RNNs) have achieved tremendous success in sequential data processing. However, it is very challenging to interpret and verify the behavior of RNNs directly. To this end, many efforts have been made to extract finite automata from RNNs. Existing approaches such as exact learning are effective at extracting finite-state models that characterize the state dynamics of RNNs on formal languages, but they are limited in scalability to natural languages. Compositional approaches that do scale to natural languages fall short in extraction precision. In this paper, we identify the transition sparsity problem, which severely impacts extraction precision. To address this problem, we propose a transition rule extraction approach that is scalable to natural language processing models and effective in improving extraction precision. Specifically, we propose an empirical method to complement the missing rules in the transition diagram. In addition, we further adjust the transition matrices to enhance the context-aware ability of the extracted weighted finite automata (WFA). Finally, we propose two data augmentation tactics to track the dynamic behavior of the target RNN more closely. Experiments on two popular natural language datasets show that our method can extract WFA from RNNs for natural language processing with better precision than existing approaches.
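For reference, the core objects here are small: a WFA scores a word by multiplying one transition matrix per token between initial and final weight vectors, and transition sparsity means unseen tokens have no matrix at all. The sketch below fills missing transitions with the mean of the observed matrices; this averaging rule is an illustrative assumption in the spirit of the paper's empirical rule completion, not its exact method:

```python
import numpy as np

class WFA:
    """Weighted finite automaton:
    weight(w1..wn) = alpha @ T[w1] @ ... @ T[wn] @ beta."""
    def __init__(self, alpha, transitions, beta):
        self.alpha, self.T, self.beta = alpha, transitions, beta
        # Transition sparsity: tokens never observed during extraction have
        # no rule. Fallback (an assumption): the mean of all observed matrices.
        self.fallback = np.mean(list(transitions.values()), axis=0)

    def weight(self, word):
        state = self.alpha
        for token in word:
            state = state @ self.T.get(token, self.fallback)
        return float(state @ self.beta)

rng = np.random.default_rng(0)
T = {tok: rng.random((3, 3)) * 0.5 for tok in ["the", "cat", "sat"]}
wfa = WFA(np.array([1.0, 0.0, 0.0]), T, np.array([0.0, 0.0, 1.0]))
print(wfa.weight(["the", "cat", "sat"]))   # all tokens have rules
print(wfa.weight(["the", "dog", "sat"]))   # "dog" falls back to the mean matrix
```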
Crime prediction is crucial for public safety and resource optimization, yet it is very challenging due to two aspects: i) the dynamics of criminal patterns, as crime events are distributed unevenly across both the spatial and temporal domains; and ii) time-evolving dependencies between different types of crimes (e.g., theft, robbery, assault, damage), which reveal fine-grained crime semantics. To tackle these challenges, we propose the Spatial-Temporal Sequential Hypergraph Network (ST-SHN), which collectively encodes complex spatial-temporal crime patterns as well as the underlying category-wise semantic relationships among crimes. Specifically, to handle spatial-temporal dynamics under a long-range and global context, we design a graph-structured message passing architecture integrated with a hypergraph learning paradigm. To capture category-wise heterogeneous crime relations in a dynamic environment, we introduce a multi-channel routing mechanism that learns the time-evolving structural dependencies across crime types. We conduct extensive experiments on two real-world datasets, showing that the proposed ST-SHN framework significantly improves prediction performance compared with various state-of-the-art baselines. The source code is available at: https://github.com/akaxlh/st-hn.
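As background for the hypergraph component, one round of generic node-to-hyperedge-to-node message passing over an incidence matrix looks as follows. This is a schematic of the paradigm ST-SHN integrates, with made-up shapes, not the paper's actual layer:

```python
import numpy as np

def hypergraph_message_pass(X, H):
    """One node -> hyperedge -> node aggregation round.

    X: (n_nodes, d) region features.
    H: (n_nodes, n_edges) incidence matrix; H[i, e] = 1 if region i belongs
    to hyperedge e (e.g., a group of regions tied to one crime category).
    """
    edge_deg = H.sum(axis=0, keepdims=True).clip(min=1.0)  # nodes per hyperedge
    node_deg = H.sum(axis=1, keepdims=True).clip(min=1.0)  # hyperedges per node
    E = (H.T @ X) / edge_deg.T     # aggregate member regions into each hyperedge
    X_new = (H @ E) / node_deg     # scatter hyperedge summaries back to regions
    return np.maximum(X_new, 0.0)  # ReLU

X = np.random.randn(6, 8)                       # 6 regions, 8-dim features
H = (np.random.rand(6, 4) > 0.5).astype(float)  # 4 hyperedges
print(hypergraph_message_pass(X, H).shape)      # (6, 8)
```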
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe two drawbacks of the positive and negative sample construction mechanisms that limit existing algorithms from further improvement. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters by maximizing the cross-view cosine similarity of positive pairs and minimizing that of negative pairs. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
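A sketch of the described objective may help: positives are the same node across the two views, negatives are the centers of the other high-confidence clusters, and the loss moves the two sets of cross-view cosine similarities in opposite directions. Shapes and names below are illustrative, not CCGC's released code:

```python
import numpy as np

def l2norm(x, eps=1e-12):
    return x / np.linalg.norm(x, axis=-1, keepdims=True).clip(min=eps)

def cluster_guided_loss(z1, z2, labels, centers):
    """z1, z2: (n, d) embeddings from the two unshared Siamese encoders.
    labels: (n,) high-confidence cluster ids. centers: (k, d) cluster centers.
    Positive pair: same node across views. Negatives: other clusters' centers."""
    z1, z2, centers = l2norm(z1), l2norm(z2), l2norm(centers)
    pos = (z1 * z2).sum(axis=1)                 # cross-view cosine similarity
    sim = z1 @ centers.T                        # (n, k) similarity to centers
    mask = np.ones_like(sim)
    mask[np.arange(len(labels)), labels] = 0.0  # drop a node's own cluster
    neg = (sim * mask).sum(axis=1) / mask.sum(axis=1)
    return float((neg - pos).mean())            # minimize: raise pos, lower neg

z1, z2 = np.random.randn(10, 16), np.random.randn(10, 16)
labels = np.random.randint(0, 3, size=10)
centers = np.random.randn(3, 16)
print(cluster_guided_loss(z1, z2, labels, centers))
```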
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that, without increasing the total number of ray-traced samples, our proposed method generates higher-quality supersampling results than current state-of-the-art methods.
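The temporal accumulation idea can be shown in miniature: blend the reprojected history with the current frame, trusting history less where the two disagree. The hand-crafted weight below is a toy proxy for the learned correlation network, under the assumption that `history` has already been warped by motion vectors:

```python
import numpy as np

def temporal_accumulate(curr, history, max_alpha=0.9):
    """Blend the current low-spp frame with the warped previous output.

    curr, history: (H, W, 3) images. The blend weight decays where the two
    disagree -- a toy stand-in for the learned correlation in the paper.
    """
    diff = np.abs(curr - history).mean(axis=-1, keepdims=True)
    alpha = max_alpha * np.exp(-8.0 * diff)  # trust history only where it matches
    return alpha * history + (1.0 - alpha) * curr

curr = np.random.rand(4, 4, 3)
hist = curr + 0.05 * np.random.randn(4, 4, 3)
print(temporal_accumulate(curr, hist).shape)  # (4, 4, 3)
```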
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first apply a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods overlook two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning bias: such incorrect new boundary frames also bias the frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism can also supply the missing consecutive visual semantics to the sparsely sampled frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
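One plausible reading of the soft-label idea is sketched below: spread each hard boundary timestamp into a small distribution over neighboring sampled frames, so that frames dropped by sparse sampling still receive supervision. The exponential kernel and temperature are assumptions, not SSRN's exact formulation:

```python
import numpy as np

def soft_boundary_labels(n_frames, boundary_idx, temperature=1.0):
    """Turn a hard boundary frame index into a soft distribution over frames."""
    idx = np.arange(n_frames)
    w = np.exp(-np.abs(idx - boundary_idx) / temperature)
    return w / w.sum()

start = soft_boundary_labels(16, boundary_idx=5)
end = soft_boundary_labels(16, boundary_idx=11)
print(start.round(3))  # mass peaks at frame 5 and decays over its neighbors
```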
Representing and synthesizing novel views of real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or exploiting temporal information between several adjacent frames, without considering the underlying background distribution of the entire scene or the transmittance along the ray dimension, which limits their performance on static and occluded areas. Our $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields approach offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, and is hence called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution of the static background and a 6D-input NeRF to represent the dynamic objects, respectively. Each ray sample is given an additional occlusion weight that indicates how much of its transmittance lies in the static versus the dynamic component. We evaluate $\text{D}^4$NeRF on public dynamic scenes and on urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
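The per-sample blending the abstract describes can be written compactly: an occlusion weight mixes the static and dynamic branches before standard volume rendering along the ray. This is a toy rendition with illustrative names, not the released $\text{D}^4$NeRF code:

```python
import numpy as np

def composite_ray(sigma_static, rgb_static, sigma_dyn, rgb_dyn, w, deltas):
    """Blend static/dynamic branches with occlusion weight w, then volume-render.

    sigma_*: (n,) densities; rgb_*: (n, 3) colors; w in [0, 1] per sample
    (w -> 1: the sample belongs to the dynamic component); deltas: segment lengths.
    """
    sigma = w * sigma_dyn + (1.0 - w) * sigma_static
    rgb = w[:, None] * rgb_dyn + (1.0 - w[:, None]) * rgb_static
    alpha = 1.0 - np.exp(-sigma * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    return (alpha * trans) @ rgb                                   # (3,) pixel color

n = 8
color = composite_ray(np.random.rand(n), np.random.rand(n, 3),
                      np.random.rand(n), np.random.rand(n, 3),
                      np.random.rand(n), np.full(n, 0.1))
print(color)
```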
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under the implicit assumption that faithful explanations follow from accurate predictions/classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction can be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, and apply it to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. The hierarchical attention-decoding modules are learned under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge of brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations alongside accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer yields quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
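Of the explainability metrics mentioned, sparsity is the easiest to make concrete: a common operationalization penalizes the entropy of the normalized attention map so that explanations concentrate on few cortical vertices. A generic sketch under that assumption, not the paper's exact regularizer:

```python
import numpy as np

def attention_sparsity_penalty(att, eps=1e-12):
    """Entropy of a normalized attention map: lower entropy = sparser,
    more focal explanation. att: (n_vertices,) nonnegative scores."""
    p = att / max(att.sum(), eps)
    return float(-(p * np.log(p + eps)).sum())

focal = np.array([0.9, 0.05, 0.03, 0.02])   # focal map   -> low penalty
diffuse = np.full(4, 0.25)                  # diffuse map -> high penalty
print(attention_sparsity_penalty(focal), attention_sparsity_penalty(diffuse))
```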